This project investigates the relationship between police incident records, time, and location in San Francisco, California, during 2018-2020. The analysis is written in R Markdown and uses the readr, dplyr, ggplot2, ggrepel, and leaflet packages. Overall, a clear decrease since early 2020 can be observed in Assault, Larceny Theft, Lost Property, Non-Criminal, Other Miscellaneous, Robbery, and Warrant records. Burglary, Motor Vehicle Theft, and Recovered Vehicle records, however, have increased somewhat since early 2020. Incidents are mostly reported during the afternoon, especially around 5-8 PM on weekdays and around noon on all days, and many are reported around 12 AM on weekends. In conclusion, the factor that most affects when a crime or arrest happens is the crime category, while other factors such as month and police district have little impact on when crimes happen.
This project investigates the relationship between police incident records, time, and location in San Francisco, California, during 2018-2020, based on the San Francisco police open data. Crime has long been a major social concern in urban areas, so knowing when these unlawful activities happen and are reported matters to San Francisco as a major U.S. city. Although this project will not establish why crimes happen at particular times, it offers some possible explanations. The dataset contains police records in San Francisco since 2018, but only data from 2018 to 2020 are included in the analysis. Records with too many missing critical values are removed from the analysis. These police records are used as an indicator of unlawful activity, which is referred to as crime in the following sections.
What is the trend in daily police records in San Francisco? In which areas does crime happen more frequently than in others? Which types of crime happen most frequently? Is there variation between different kinds of crime? At what times does theft happen most frequently? What are the possible factors affecting when crime happens, and how are they correlated? Is there a correlation between the number of incidents and the crime category? Is there a correlation between the number of incidents and the police district?
The original column names are verbose. This step shortens the names of the columns used in the following analysis.
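library(tidyverse)  # attaches dplyr, readr, and ggplot2, so library(dplyr) is not needed

# `df` is assumed to already hold the raw incident data loaded earlier.
df <- df %>%
  rename(
    Date       = `Incident Date`,
    Time       = `Incident Time`,
    Year       = `Incident Year`,
    DayOfWeek  = `Incident Day of Week`,
    Category   = `Incident Category`,
    Descript   = `Incident Description`,
    PdDistrict = `Police District`,
    Y          = Latitude,
    X          = Longitude
  ) %>%
  # Keep Time as text so the hour can be split off later.
  mutate(Time = as.character(Time))
# str(df)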
The graph indicates that the number of daily police records decreased dramatically during 2020, probably because people stayed at home during the Covid-19 pandemic. The months with the lowest daily numbers are around March to April 2020, when the government announced the quarantine measures. In general, the number of daily police records from 2020 onward is much lower than in the previous two years.
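The daily trend described above can be computed along the following lines. This is a minimal sketch, not the code behind the original figure: the `toy` data frame and its rows are invented for illustration, and it assumes `Date` has already been parsed into R's `Date` class.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for the renamed incident data; the rows are invented.
toy <- data.frame(
  Date = as.Date(c("2019-03-01", "2019-03-01", "2019-03-02",
                   "2020-04-01", "2020-04-02")),
  Category = c("Larceny Theft", "Assault", "Larceny Theft",
               "Burglary", "Larceny Theft"),
  stringsAsFactors = FALSE
)

# One row per day, holding the number of records on that day.
daily <- toy %>%
  count(Date, name = "Records")

# Line plot of records per day.
p_daily <- ggplot(daily, aes(x = Date, y = Records)) +
  geom_line() +
  labs(x = "Date", y = "Daily police records")
```

On the real data, printing `p_daily` would draw one point per calendar day, which is where a drop like the one in early 2020 becomes visible.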
### Interactive Map of Crime Incidents

This map shows the locations of crime incidents during 2018-2020. Clicking an icon on the map opens a pop-up with the incident details. There are many incidents in the Tenderloin and the Mission, as well as in Bayview and Fisherman’s Wharf. The notorious Tenderloin district has the highest concentration of crimes, which spreads north of it to Downtown. South of Market St. and west of US 101, the number of incidents decreases dramatically. Another noteworthy pattern is that incidents decrease going east or north from the Tenderloin towards Nob Hill and the Financial District, but increase again at the waterfront.
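A map with clickable pop-ups of this kind can be sketched with the leaflet package. This is only an illustration, not the original map: the `toy` rows and their coordinates are invented approximations. The column names `X` (longitude), `Y` (latitude), `Category`, and `Descript` follow the renaming step earlier in the report, and clustering the markers is an assumption made here to keep a dense map readable.

```r
library(leaflet)

# Toy incidents; coordinates are rough, invented points in San Francisco.
toy <- data.frame(
  X = c(-122.4128, -122.4194),   # longitude
  Y = c(37.7837, 37.7599),       # latitude
  Category = c("Larceny Theft", "Assault"),
  Descript = c("Theft from vehicle", "Battery"),
  stringsAsFactors = FALSE
)

# Base map tiles plus one marker per incident; clicking a marker
# opens a pop-up showing the category and description.
incident_map <- leaflet(toy) %>%
  addTiles() %>%
  addMarkers(
    lng = ~X, lat = ~Y,
    popup = ~paste0(Category, ": ", Descript),
    clusterOptions = markerClusterOptions()  # group nearby markers when zoomed out
  )
```

Printing `incident_map` in an R Markdown document renders the interactive widget.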
The data are summarized by incident category. According to the list of the most frequent record categories, Larceny Theft accounts for about 30% of all records, the largest share of any category. The ten most frequent categories, starting from the most frequent, are Larceny Theft, Other Miscellaneous, Malicious Mischief, Non-Criminal, Assault, Burglary, Motor Vehicle Theft, Recovered Vehicle, Warrant, and Lost Property; the full top 20 is listed below.
## # A tibble: 20 x 3
##    Category                                 Frequency Percentage
##    <chr>                                        <int>      <dbl>
##  1 Larceny Theft                               142622     30.2
##  2 Other Miscellaneous                          34758      7.35
##  3 Malicious Mischief                           31074      6.57
##  4 Non-Criminal                                 29055      6.14
##  5 Assault                                      28308      5.99
##  6 Burglary                                     26585      5.62
##  7 Motor Vehicle Theft                          21894      4.63
##  8 Recovered Vehicle                            17141      3.62
##  9 Warrant                                      15508      3.28
## 10 Lost Property                                14690      3.11
## 11 Fraud                                        14235      3.01
## 12 Drug Offense                                 11484      2.43
## 13 Robbery                                      11071      2.34
## 14 Missing Person                               10616      2.25
## 15 Suspicious Occ                                9440      2.00
## 16 Disorderly Conduct                            8023      1.70
## 17 Offences Against The Family And Children      6602      1.40
## 18 Traffic Violation Arrest                      5526      1.17
## 19 Miscellaneous Investigation                   4456      0.942
## 20 Other Offenses                                4009      0.848
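A frequency table with these three columns could be produced roughly as follows. This is a sketch on invented data, not the original chunk: `toy` is hypothetical, and the column names `Frequency` and `Percentage` are taken from the printed output above.

```r
library(dplyr)

# Toy stand-in for the incident data; the counts are invented.
toy <- data.frame(
  Category = c(rep("Larceny Theft", 5), rep("Assault", 3), rep("Burglary", 2)),
  stringsAsFactors = FALSE
)

top_categories <- toy %>%
  # Records per category, most frequent first.
  count(Category, name = "Frequency", sort = TRUE) %>%
  # Share of all records, in percent.
  mutate(Percentage = 100 * Frequency / sum(Frequency)) %>%
  # Keep at most the 20 most frequent categories.
  head(20)
```

The percentage is computed before truncating to 20 rows, so it stays a share of all records instead of a share of the top 20.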
This pie chart shows the percentage that each record category takes in all records.
The following is a bar plot of incident categories with high frequency.

Record number by date and category: According to the graph, Burglary peaked around May 2020 and has increased slightly since early 2020; Larceny Theft has the largest variation and shows a dramatic decrease since early 2020, similar to the trend for all incident records; Lost Property has also decreased somewhat since early 2020; Warrant had a few peaks during 2019. Overall, a clear decrease since early 2020 can be observed in Assault, Larceny Theft, Lost Property, Non-Criminal, Other Miscellaneous, Robbery, and Warrant. However, Burglary, Motor Vehicle Theft, and Recovered Vehicle have increased somewhat since early 2020.
After exploring the dataset, this section analyzes the general trends in when crimes happen, and the relationship between crime and both time of day and day of the week.
According to the previous list, Larceny Thefts happen most often, so let’s explore more about this category.
Trend of the number of Larceny thefts by date: As the graph shows, theft frequency drops dramatically during 2020, which is similar to the trend of daily crime numbers.
To further understand when thefts happen most often, the hour and day of the week are analyzed. With a heatmap of days of the week and hours, readers can see at a glance which days and hours have the most incident records and compare them with other time intervals.
This requires separate Hour and Day-of-Week columns: although the dataset has a day-of-week column, the hour still has to be extracted from the Time column. The strsplit function is used to extract the hour. Day-of-week and hour values are reordered using the factor function.
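The extraction and reordering just described might look like the sketch below. It assumes `Time` is stored as text in `HH:MM` form and that the week should start on Monday; both are assumptions, and the `toy` rows are invented.

```r
# Toy rows; Time is assumed to be text in "HH:MM" form.
toy <- data.frame(
  Time = c("18:30", "00:05", "12:00", "07:45"),
  DayOfWeek = c("Friday", "Sunday", "Monday", "Wednesday"),
  stringsAsFactors = FALSE
)

# strsplit() returns one character vector per row; keep the first piece (the hour).
hour_int <- as.integer(sapply(strsplit(toy$Time, ":"), `[`, 1))

# Explicit factor levels fix the axis order on a later heatmap.
toy$Hour <- factor(hour_int, levels = 0:23)
toy$DayOfWeek <- factor(
  toy$DayOfWeek,
  levels = c("Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday")
)
```

Without the explicit `levels`, the days would sort alphabetically (Friday, Monday, Saturday, ...), which makes a day-by-hour heatmap hard to read.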
The heatmap indicates that a larger number of thefts happen between 6 PM and 7 PM on weekdays.
The same heatmap developed for all crime incident records:
To minimize the effect of low-frequency crimes, a heatmap is created for crime categories with over 20,000 records. According to Fig. 8, most incidents occur at 12 AM on weekends, or at 12 PM and 5-7 PM on weekdays. These are the hours when day-shift workers are likely to be on break or off work.
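The filtering and heatmap steps can be sketched as follows. The data are invented, and `min_records` is set to 2 only so the toy example has something to drop; on the real data the threshold from the text would be 20000. The colour scale is an arbitrary choice, not the one used in Fig. 8.

```r
library(dplyr)
library(ggplot2)

# Toy incidents with the hour already extracted; the rows are invented.
toy <- data.frame(
  Category = c(rep("Larceny Theft", 4), rep("Assault", 3), "Arson"),
  DayOfWeek = c("Friday", "Friday", "Saturday", "Monday",
                "Saturday", "Sunday", "Friday", "Tuesday"),
  Hour = c(18, 18, 0, 12, 0, 1, 17, 3),
  stringsAsFactors = FALSE
)

min_records <- 2  # 20000 on the real data

# Keep only categories with more than `min_records` records in total.
frequent <- toy %>%
  group_by(Category) %>%
  filter(n() > min_records) %>%
  ungroup()

# Records per (day of week, hour) cell.
cell_counts <- frequent %>%
  count(DayOfWeek, Hour, name = "Records")

# One tile per cell, filled by the number of records.
p_heat <- ggplot(cell_counts, aes(x = factor(Hour), y = DayOfWeek, fill = Records)) +
  geom_tile() +
  scale_fill_gradient(low = "white", high = "darkred") +
  labs(x = "Hour of day", y = "Day of week")
```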
The following graphs aim to explain the relationship between time of day, day of the week, and incident records.
To further examine crime category as a factor in when crimes are reported, the heatmap is displayed by crime category. Some criminal activities, like prostitution, might presumably occur predominantly at night. Faceting is a ggplot2 technique that makes such analysis easier by producing a separate graph for each value of another variable. Here, ggplot2 draws a heatmap for each of the top incident categories, which makes it possible to check whether the pattern changes meaningfully between categories.
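As a rough illustration of faceting, the sketch below draws one small heatmap per category with `facet_wrap()`. The `toy` rows are invented, and the layout is an assumption, not a reproduction of Fig. 9.

```r
library(dplyr)
library(ggplot2)

# Toy incidents; the rows are invented.
toy <- data.frame(
  Category = c("Larceny Theft", "Larceny Theft", "Larceny Theft",
               "Assault", "Assault"),
  DayOfWeek = c("Friday", "Friday", "Monday", "Saturday", "Sunday"),
  Hour = c(18, 18, 12, 0, 1),
  stringsAsFactors = FALSE
)

# Records per (category, day of week, hour) cell.
cell_counts <- toy %>%
  count(Category, DayOfWeek, Hour, name = "Records")

# facet_wrap() repeats the same heatmap once per category.
p_facets <- ggplot(cell_counts, aes(x = factor(Hour), y = DayOfWeek, fill = Records)) +
  geom_tile() +
  facet_wrap(~ Category) +
  labs(x = "Hour of day", y = "Day of week")
```

All panels share one fill scale by default, which is why a very frequent category can wash out the patterns in the others.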
As shown in Fig. 9, Larceny Theft has so many records that the time distribution of the other categories is hard to see, so the data need normalization. After applying normalized percent inhibition, Fig. 10 shows some interesting patterns. Burglary mostly happens around 5 PM on Fridays and after midnight around 2-3 AM, whereas other incident categories are least likely to happen around 2-3 AM. A high percentage of Assault incidents occur around 12-2 AM on weekends. Drug Offense mostly happens during 2-4 PM on Tuesdays and 1-3 PM on Wednesdays. Fraud is the category with the most records at around 12 AM and 12 PM, possibly because a fraud case can have many victims but few suspects, and the police may enter each victim as an individual record. A larger number of Larceny Theft incidents happen between 6 PM and 7 PM on weekdays. Lost Property has the most records around 12 AM on Fridays, Saturdays, and Sundays, and around 12 PM on Saturdays. There is a high percentage of Motor Vehicle Theft incidents at around 5 PM on Fridays or around 5-6 PM on weekdays. Recovered Vehicle records cluster around 9 AM to 12 PM on Mondays and Tuesdays. In general, most incidents occur during the daytime, while some categories happen more often around midnight.
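The exact normalization formula is not shown in the text, so the sketch below uses a simpler stand-in: each cell's count is divided by its category's total, so that every category sums to 1 and can be compared on the same colour scale. This is an assumption for illustration and may differ from the method behind Fig. 10; the counts are invented.

```r
library(dplyr)

# Toy per-cell counts; the numbers are invented.
cell_counts <- data.frame(
  Category = c("Larceny Theft", "Larceny Theft", "Fraud", "Fraud"),
  DayOfWeek = c("Friday", "Monday", "Friday", "Monday"),
  Hour = c(18, 12, 0, 12),
  Records = c(900, 100, 30, 10),
  stringsAsFactors = FALSE
)

# Share of each category's records that falls in each (day, hour) cell.
normalized <- cell_counts %>%
  group_by(Category) %>%
  mutate(Share = Records / sum(Records)) %>%
  ungroup()
```

In this toy data the raw counts differ by a factor of 30 between the two categories, while the shares (0.9 and 0.1 against 0.75 and 0.25) are directly comparable.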
The same analysis, by police district: Fig. 11 displays the time and day of the week of incident records in each police district. Most of them occur during the daytime on weekdays and around midnight on weekends. There are no significant differences in pattern among the police districts.
If crime is tied to activities, the time of year may have an effect through special periods such as holiday seasons. Months are reordered using the factor function. However, according to Fig. 12, the months show similar patterns, which means there is likely little relation between the month and when incidents happen.
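Reordering months with `factor()` could be done as in this sketch. It assumes `Date` is of class `Date`, and the `toy` dates are invented. The month name is looked up from the month number through R's built-in `month.name` so the result does not depend on the system locale.

```r
# Toy dates; invented for illustration.
toy <- data.frame(Date = as.Date(c("2019-12-25", "2019-01-01", "2019-07-04")))

# Month number (1-12) taken from the date, then mapped to an English name.
month_number <- as.integer(format(toy$Date, "%m"))

# Calendar order instead of alphabetical order.
toy$Month <- factor(month.name[month_number], levels = month.name)
```

With these levels, plots faceted or grouped by `Month` run from January to December instead of starting at April.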
Because changes over the years may affect when these incidents happen, the year is also analyzed as a factor. In 2020, there are more incident records around midnight, but fewer during the late afternoon on weekdays, compared with 2018 and 2019.
Based on the graphs above, fewer incidents generally happen at night on weekdays than on weekends. Most incidents happen during the afternoon on weekdays. Crime category can be an important factor in when crime happens, since different categories show different time patterns. The timing of crimes may be affected by year if there is a disaster lasting over a year, such as Covid-19. However, when crimes happen appears to have little relation to the month or the police district.